Obesity is widely viewed as one of the greatest public health problems facing Americans today. Obesity levels have risen at an astounding rate since the 1980s, and approximately 35% of the adult population is currently considered [obese](http://www.cdc.gov/obesity/data/adult.html). Obesity is of particular concern because it is associated with other diseases, including diabetes, hypertension, and asthma. Much of the literature on the issue attributes high levels of obesity to a lack of access to good food and a lack of education about what good food is. Increasing nutrition education, the number of bike paths, and the number of food vendors selling fresh fruits and vegetables are often proposed as solutions; however, these efforts alone often fail to reduce obesity [rates](http://www.latimes.com/local/california/la-me-0510-south-la-food-20150510-story.html#page=1). Public health officials and policy makers must consider other, possibly deeper-rooted causes of obesity and other health risks if they wish to enact lasting change.
This project examines the rates of ER visits for asthma, diabetes, and hypertension as proxies for obesity and general community health in San Francisco, looking at race, class, and pollution as possible predictive factors. I chose San Francisco as a model because, while it is considered one of the [healthiest cities](http://www.forbes.com/sites/melaniehaiken/2014/05/30/the-20-healthiest-cities-in-america-2014/2/), it is also a city of great economic disparity whose neighborhoods are often defined racially. Additionally, likely because SF is home to more than its fair share of data analysts and “brogrammers,” a great deal of free data was available online.
Data was collected from three sources:

* Census data was collected from Social Explorer
* Income, education, PM2.5, and census tract shapefiles were collected from SF Open Data
* Disease rates were collected from SFHIP
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco,+CA&zoom=12&size=%20640x640&scale=%202&maptype=roadmap&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San+Francisco,+CA&sensor=false
Before delving into the variables used, this first map allows us to orient ourselves in San Francisco and get a feel for the different neighborhoods (defined by zip code). This map was compiled by associating zip codes with census tracts (unfortunately this was done manually because I could not find an effective dataset that could join them). Worth noting: (a) census tracts and zip codes do not play nicely so the boundaries of some of the neighborhoods are a little off; (b) some census tracts do include parts of the bay, so do not be surprised in later maps by some of the stranger shapes.
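The manual association described above amounts to a two-step lookup: census tract to zip code, then zip code to neighborhood. The following is a minimal Python sketch of that idea, not the code used for this project. All tract IDs, zip codes, and neighborhood names are placeholders, and the function name `label_tracts` is hypothetical.

```python
# Hypothetical crosswalk: census tract ID -> zip code.
# All keys and values here are placeholders, not real San Francisco data.
TRACT_TO_ZIP = {
    "tract_A": "zip_1",
    "tract_B": "zip_1",
    "tract_C": "zip_2",
    "tract_D": "zip_3",  # deliberately has no neighborhood entry below
}

# Hypothetical lookup: zip code -> neighborhood name.
ZIP_TO_NEIGHBORHOOD = {
    "zip_1": "Neighborhood X",
    "zip_2": "Neighborhood Y",
}


def label_tracts(tract_to_zip, zip_to_neighborhood):
    """Attach a neighborhood name to each tract via its zip code.

    Tracts whose zip code has no neighborhood entry are labeled None,
    so gaps in the crosswalk stay visible instead of being dropped.
    """
    return {
        tract: zip_to_neighborhood.get(zip_code)
        for tract, zip_code in tract_to_zip.items()
    }


labeled = label_tracts(TRACT_TO_ZIP, ZIP_TO_NEIGHBORHOOD)
unmatched = [tract for tract, name in labeled.items() if name is None]
```

Listing the unmatched tracts is one way to find the places where tract and zip code boundaries disagree.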
The above maps display the age-adjusted rates of ER visits per ten thousand people for asthma, diabetes, and hypertension. Because this data was collected at the neighborhood level, the maps show clearly defined neighborhoods. In particular, the neighborhood on the far right of the map, known as Bayview/Hunter’s Point, shows dramatically high levels of ER visits for all three diseases. The neighborhoods north of Bayview (North of Market, South of Market, and the Tenderloin) also show higher rates of ER visits. These maps suggest that there is interesting variation in the rates of disease in San Francisco that presents itself at the neighborhood level.
The above maps show the percentage of residents of each race in the various census tracts in San Francisco. As suspected, the racial makeup of the census tracts also displays interesting geographic trends. Higher populations of white people are seen in the north end of the city along the water, with another cluster in the center of the city.
The black and African American community seems to cluster around Bayview/Hunter’s Point, and to a lesser degree North of Market.
The Asian population is higher on the west and south sides of the city and also in the northeast corner, where Chinatown is located.
The “other” population, which in SF is likely to be largely Hispanic, is higher south of Market in the Mission and more generally on the southern side of the city.
The racial definition of many neighborhoods suggests that there could be a correlation between disease rates and the racial makeup of an area.
The above maps are meant to explore two other possible factors: pollution levels and class. PM2.5 levels and population density were used as measures of pollution. PM2.5 levels refer to the amount of small particulate matter in the air, including dirt, dust, soot, and [smoke](http://www.epa.gov/pmdesignations/faq.htm).
The east side of San Francisco shows higher PM2.5 levels than the west side, likely due to the concentration of industry on the east side of the city. There are particularly high levels around Market Street.
Population density is fairly consistent throughout the city with higher levels appearing only in the area just North of Market.
Median income and the percent of the population with some college education were used as measures of class. While there appear to be some geographic patterns, it is difficult to tell: this data came from an older census, and tract boundaries have changed enough since then that there are several gaps in key areas of interest.
Thus, because the map of PM2.5 levels shows a clear geographic pattern, I will include it in my model in addition to race. Population density does not seem to display much of a pattern, and the class variables are simply missing too much data for an effective analysis (additionally, when the regression was performed with the class variables, their confidence intervals included zero).
Asthma ER visits per 10,000 (coefficient estimates and 95% confidence intervals):

| Term | Estimate | 2.5 % | 97.5 % |
|---|---|---|---|
| (Intercept) | 354.813677 | -331 | 1041 |
| perc.white | -5.451635 | -12 | 1 |
| perc.black | -4.212618 | -11 | 3 |
| perc.asian | -5.540540 | -12 | 1 |
| perc.other | -5.186759 | -12 | 2 |
| MEAN_PM | 24.554964 | 19 | 30 |

Diabetes ER visits per 10,000 (coefficient estimates and 95% confidence intervals):

| Term | Estimate | 2.5 % | 97.5 % |
|---|---|---|---|
| (Intercept) | -28.76240665 | -312 | 255 |
| perc.white | -0.33763662 | -3 | 2 |
| perc.black | 0.05912898 | -3 | 3 |
| perc.asian | -0.40388190 | -3 | 2 |
| perc.other | 0.09954666 | -3 | 3 |
| MEAN_PM | 8.49808936 | 6 | 11 |

Hypertension ER visits per 10,000 (coefficient estimates and 95% confidence intervals):

| Term | Estimate | 2.5 % | 97.5 % |
|---|---|---|---|
| (Intercept) | 99.850041 | -135 | 334 |
| perc.white | -1.480412 | -4 | 1 |
| perc.black | -1.062113 | -3 | 1 |
| perc.asian | -1.512157 | -4 | 1 |
| perc.other | -1.328824 | -4 | 1 |
| MEAN_PM | 6.994639 | 5 | 9 |

I performed a linear regression analysis on the rates of ER visits for asthma, diabetes, and hypertension, using the percentages of different races and the PM2.5 levels as predictor variables. In all three regressions, only PM2.5 levels show an effect whose confidence interval does not include zero. For asthma, an increase of 1 in PM2.5 levels corresponds to an increase of about 25 ER visits per 10,000 people (coefficient 24.55), holding the racial makeup constant. For diabetes, it corresponds to an increase of about 8 visits, and for hypertension an increase of about 7 visits.
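To make this reading of the output concrete, here is a minimal Python sketch (not the original analysis code, which is not shown in this write-up). It takes the asthma estimates and the rounded 95% intervals exactly as reported in the regression output and checks which intervals exclude zero. The dictionary and function names are illustrative assumptions.

```python
# Asthma model: estimates and rounded 95% confidence intervals,
# copied from the regression output reported in this write-up.
ASTHMA_ESTIMATES = {
    "(Intercept)": 354.813677,
    "perc.white": -5.451635,
    "perc.black": -4.212618,
    "perc.asian": -5.540540,
    "perc.other": -5.186759,
    "MEAN_PM": 24.554964,
}

ASTHMA_CI = {
    "(Intercept)": (-331, 1041),
    "perc.white": (-12, 1),
    "perc.black": (-11, 3),
    "perc.asian": (-12, 1),
    "perc.other": (-12, 2),
    "MEAN_PM": (19, 30),
}


def excludes_zero(interval):
    """Return True when both ends of the interval lie on the same side of zero."""
    low, high = interval
    return low > 0 or high < 0


def terms_excluding_zero(ci_table):
    """List the model terms whose confidence interval does not contain zero."""
    return [term for term, interval in ci_table.items() if excludes_zero(interval)]


def predicted_change(estimate, delta):
    """Change in the fitted ER-visit rate when one predictor changes by
    `delta` and all other predictors are held fixed."""
    return estimate * delta


flagged = terms_excluding_zero(ASTHMA_CI)
change_per_unit_pm = predicted_change(ASTHMA_ESTIMATES["MEAN_PM"], 1.0)
```

The same check can be repeated for the diabetes and hypertension models by swapping in their reported values.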
These results suggest that pollution levels and environmental quality are correlated with disease rates. However, the maps of the residuals indicate that the model is severely lacking and that other variables are influencing the rates of ER visits for these diseases.
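A residual is the observed value minus the value the model predicts, so large or spatially clustered residuals point to variation the model does not explain. The Python sketch below is a simplified, single-predictor illustration of that calculation, not the multi-variable model fitted in this project. The PM2.5 and ER-visit numbers are made up purely for illustration, and the function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def compute_residuals(x, y, intercept, slope):
    """Observed value minus fitted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up illustrative values: one PM2.5 level and one ER-visit rate per area.
pm_levels = [8.0, 9.0, 10.0, 11.0]
er_visits = [40.0, 70.0, 85.0, 120.0]

intercept, slope = fit_simple_ols(pm_levels, er_visits)
residuals = compute_residuals(pm_levels, er_visits, intercept, slope)
```

Mapping each area's residual, as done above for the real model, shows where the fit over- or under-predicts.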
Overall, this project suggests that the quality of the air people live in is a strong predictor of the community’s health. However, it also suggests that there are other factors beyond race and PM2.5 levels that strongly influence community health. This project only barely scratches the surface, and future studies into the underlying causes of disease rates in San Francisco could be valuable.
I do have a couple of hypotheses regarding my results. One flaw with this study is that the health data was collected at the neighborhood level, while the other data was at the census tract level. Unfortunately, neighborhoods and census tracts don’t always line up. A more refined analysis could be performed using health data collected at the census tract level; however, that is not currently available for free on the internet. Another source of concern is the use of ER visits as a proxy for rates of disease. Lower-income communities are more likely to visit the ER than to have frequent non-emergency visits to a doctor. The use of this metric may have biased the data, though I did not see any clear patterns linking higher levels of ER visits to income.
One possibility for future studies would be to look at changes in the predictor variables and the rates of disease in San Francisco over time. San Francisco has been a huge site of gentrification in the last 20 or 30 years. Examining whether disease rates have shifted as community makeup has shifted, or whether they have stayed fairly constant, could tell us whether the diseases are a result of the environment or of the people living there.